3 research outputs found

Document Collection Visualization and Clustering Using An Atom Metaphor for Display and Interaction

Author: Nghi Khanh V.
Publication venue: ScholarWorks @ UTRGV
Publication date: 01/05/2013

Visual data mining has proven to be of high value in exploratory data analysis and data mining because it provides intuitive feedback on data analysis and supports decision-making activities. Several visualization techniques have been developed for cluster discovery, such as Grand Tour, HD-Eye, and Star Coordinates. They are very useful tools that visualize data in 2D or 3D; however, they are not simple for untrained users. This thesis proposes a new approach to building a 3D clustering visualization system for document clustering using the k-means algorithm. A cluster is represented by a neutron (centroid) and electrons (documents), which are kept at a distance from the neutron by a force. Our approach employs quantified domain knowledge and explorative observation as prediction to map high-dimensional data onto 3D space, revealing the relationships among documents. Users can perform an intuitive visual assessment of the consistency of the cluster structure

Scholarworks@UTRGV Univ. of Texas RioGrande Valley

The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation

Author: Bui Nghi D. Q.
Dau Anh T. V.
Guo Jin
Hai Nam Le
Manh Dung Nguyen
Nghiem Khanh
Nguyen Anh Minh
Publication date: 09/05/2023

We present The Vault, an open-source, large-scale code-text dataset designed to enhance the training of code-focused large language models (LLMs). Existing open-source datasets for training code-based LLMs often face challenges in terms of size, quality (due to noisy signals), and format (only containing code function and text explanation pairings). The Vault overcomes these limitations by providing 40 million code-text pairs across 10 popular programming languages, thorough cleaning for 10+ prevalent issues, and various levels of code-text pairings, including class, function, and line levels. Researchers and practitioners can utilize The Vault for training diverse code-focused LLMs or incorporate the provided data cleaning methods and scripts to improve their datasets. By employing The Vault as the training dataset for code-centric LLMs, we anticipate significant advancements in code understanding and generation tasks, fostering progress in both artificial intelligence research and software development practices

arXiv.org e-Print Archive

Risk preferences and development revisited

Author: A Bruhin
A Tversky
AO Hopland
AS Booij
B Donkers
B Kőszegi
CA Holt
D Filmer
D Kahneman
D Prelec
E Diecidue
EM Liu
Ferdinand M. Vieider
FM Vieider
G Baltussen
G Feder
GW Harrison
H Fehr-Duda
H Markowitz
H-M von Gaudecker
HP Binswanger
J Haushofer
J Pahlke
J Sydnor
JC Cardenas
JD Hey
K Train
M Abdellaoui
M Bauer
M Yesuf
ME Yaari
MO Rieger
MR Rosenzweig
N Etchart-Vincent
Nghi Truong
O Attanasio
Peter Martinsson
Pham Khanh Nam
PP Wakker
R Croson
R Sugden
S Choi
S Jayachandran
S Liebenehm
S Zeisberger
SJ Kachelmeier
T Dohmen
T Tanaka
U Schmidt
V Köbberling
X Giné
Publication venue: Springer Science and Business Media LLC

Crossref